Sifter - Turn a folder of documents into typed records you can query
Sifter MCP Server
Turn a folder of documents into a database your agent can query.
RAG is great at finding a passage. It can't answer the questions people actually ask about a pile of documents — "how many invoices are unpaid", "total billed to this client this year", "which contracts expire in the next 90 days". Those are aggregations over the whole collection, and top-k retrieval only ever sees a handful of docs.
Sifter takes a different path: it extracts every document into a typed record (you describe the fields in plain language, the schema is inferred), then exposes them over MCP so your agent can query and aggregate them — exact counts, sums, filters, group-bys — with every field cited back to its source page. Not a paragraph. A figure.
What the agent can do
- Create a sift — define an extraction in natural language (e.g. "from invoices: client, date, total — skip anything that isn't an invoice").
- Upload documents — PDFs, scans, contracts, receipts, images.
- List & filter records — typed fields, real filters.
- Aggregate — counts, sums, group-bys over all records, not a sample.
- Get citations — trace any value back to its source document, page, and bounding box.
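The operations above can be pictured with a minimal Python sketch. It is illustrative only: the `Invoice` and `Citation` classes, their field names, and the sample values are assumptions invented for this example, not Sifter's actual record schema or API. The point is that once every document is a typed record, filters and aggregates run over the whole collection rather than a retrieved sample.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Citation:
    """Where a value came from (hypothetical shape)."""
    document: str
    page: int
    bbox: tuple[float, float, float, float]  # x0, y0, x1, y1


@dataclass(frozen=True)
class Invoice:
    """One typed record extracted from one document (hypothetical fields)."""
    client: str
    total: Decimal
    paid: bool
    total_source: Citation  # citation for the `total` field


# Made-up sample records standing in for an extracted collection.
RECORDS = [
    Invoice("Acme", Decimal("1200.00"), True, Citation("acme-01.pdf", 1, (72, 540, 180, 556))),
    Invoice("Acme", Decimal("300.50"), False, Citation("acme-02.pdf", 2, (72, 120, 180, 136))),
    Invoice("Globex", Decimal("980.00"), False, Citation("globex-01.pdf", 1, (90, 600, 200, 616))),
]


def unpaid(records):
    """Filter: keep only invoices that have not been paid."""
    return [r for r in records if not r.paid]


def total_by_client(records):
    """Group-by and sum: total billed per client, highest first."""
    totals = defaultdict(Decimal)  # Decimal() starts each client at 0
    for r in records:
        totals[r.client] += r.total
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `sum(r.total for r in unpaid(RECORDS))` adds up every unpaid invoice in the sample, and each record's `total_source` points back to the document, page, and bounding box the figure was read from.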
Connect
Remote (hosted, zero install — Starter+)
{
  "mcpServers": {
    "sifter": {
      "url": "https://api.sifter.run/mcp",
      "headers": { "Authorization": "Bearer sk-..." }
    }
  }
}
Get an API key at sifter.run → API Keys. The remote endpoint is a Starter+ feature; free-plan keys receive an HTTP 402 (Payment Required) response on tool calls.
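If you would rather not paste the key into a config file by hand, a small script can generate the remote entry for you. This is a sketch under assumptions: the `remote_config` helper is hypothetical, and reading the key from a `SIFTER_API_KEY` environment variable is a convenience chosen for this example. Only the JSON shape and the URL come from the config shown above.

```python
import json
import os


def remote_config(api_key: str) -> dict:
    """Build the remote-server entry shown above for a given API key."""
    return {
        "mcpServers": {
            "sifter": {
                "url": "https://api.sifter.run/mcp",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


if __name__ == "__main__":
    # Assumption: the key is exported as SIFTER_API_KEY in the shell.
    # The "sk-..." fallback is a placeholder, not a working key.
    key = os.environ.get("SIFTER_API_KEY", "sk-...")
    print(json.dumps(remote_config(key), indent=2))
```

Where the printed JSON should be saved depends on your MCP client, so check that client's documentation for the config file location.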
Local (self-host, free, MIT — bring your own model)
{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": ["sifter-mcp", "--base-url", "http://localhost:8000"],
      "env": { "SIFTER_API_KEY": "sk-..." }
    }
  }
}
Run the open-source engine with docker compose up -d and point the server at your instance.
Local models work — the LLM is only the extractor, so nothing has to leave your machine.
Try it
- "How much have we invoiced per client this year, highest first?"
- "What's the total unpaid across all invoices?"
- "Which contracts expire in the next 90 days?"
Each runs as a real query over every record and returns an exact answer, traceable to the source.
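To make "a real query over every record" concrete, here is a rough Python sketch of what the contract-expiry prompt reduces to once contracts are typed records. The `Contract` class, its fields, and the `expiring_within` helper are hypothetical names for illustration. This is not how Sifter implements the query, only the kind of date-window filter such a question implies.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Contract:
    """One extracted contract record (hypothetical fields)."""
    counterparty: str
    expires_on: date


def expiring_within(contracts, today, days=90):
    """Return contracts expiring between today and today + days, soonest first."""
    cutoff = today + timedelta(days=days)
    # Both ends are inclusive; contracts that already expired are excluded.
    hits = [c for c in contracts if today <= c.expires_on <= cutoff]
    return sorted(hits, key=lambda c: c.expires_on)


# Made-up sample data.
CONTRACTS = [
    Contract("B", date(2025, 4, 1)),
    Contract("A", date(2025, 2, 15)),
    Contract("C", date(2025, 4, 2)),
    Contract("D", date(2024, 12, 31)),
]

soon = expiring_within(CONTRACTS, today=date(2025, 1, 1))
```

Because the filter checks every record's date, a contract just inside the window is never missed the way it could be if only a handful of retrieved passages were examined.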
Links
- Repo (open source, MIT): https://github.com/sifter-ai/sifter
- Docs: https://docs.sifter.run
- Hosted: https://sifter.run
Tags: document-extraction · structured-data · rag · pdf · ocr · data · agents · invoices · self-hosted
Server Config
{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": [
        "sifter-mcp",
        "--base-url",
        "https://api.sifter.run/api"
      ],
      "env": {
        "SIFTER_API_KEY": "sk-..."
      }
    }
  }
}